Data Cleaning
1. Reshape Q1_B_1
This dataset contains information on the animals kept by each household. We will reshape it so that each row is one household, which lets us merge it with the other datasets. We will also replace special values (NA, -1) where needed.
1.1 Reshaping from long to wide
We need one row per household.
So we use the reshape command to say: “Reshape the dataset to wide format, with just one row per household (FSN), replicating the variables count, dist, indor and daysin for every animal type the household has.”
animals <- reshape(Q1_B_1,direction = "wide", timevar ="anim", idvar ="FSN", v.names = c("count","dist","indor", "daysin"), sep = "_")
str(animals)
'data.frame': 7147 obs. of 29 variables:
$ FSN : int 45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
$ count_Goa : int 3 NA NA 2 NA 7 1 3 3 NA ...
$ dist_Goa : int 5 NA NA 5 NA 0 5 15 0 NA ...
$ indor_Goa : int 1 NA NA 1 NA 1 1 1 1 NA ...
$ daysin_Goa: int 120 NA NA 180 NA 360 150 90 90 NA ...
$ count_Pou : int 1 NA NA NA NA NA NA NA NA 2 ...
$ dist_Pou : int 0 NA NA NA NA NA NA NA NA 0 ...
$ indor_Pou : int 1 NA NA NA NA NA NA NA NA 1 ...
$ daysin_Pou: int 365 NA NA NA NA NA NA NA NA 360 ...
$ count_Buf : int NA 1 NA NA NA NA NA NA NA NA ...
$ dist_Buf : int NA 4 NA NA NA NA NA NA NA NA ...
$ indor_Buf : int NA 0 NA NA NA NA NA NA NA NA ...
$ daysin_Buf: int NA -1 NA NA NA NA NA NA NA NA ...
$ count_Cow : int NA NA 1 NA 2 NA 3 NA NA NA ...
$ dist_Cow : int NA NA 3 NA 0 NA 15 NA NA NA ...
$ indor_Cow : int NA NA 1 NA 1 NA 0 NA NA NA ...
$ daysin_Cow: int NA NA 210 NA 360 NA -1 NA NA NA ...
$ count_Pig : int NA NA NA NA NA NA NA NA NA NA ...
$ dist_Pig : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Pig : int NA NA NA NA NA NA NA NA NA NA ...
$ daysin_Pig: int NA NA NA NA NA NA NA NA NA NA ...
$ count_Dog : int NA NA NA NA NA NA NA NA NA NA ...
$ dist_Dog : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Dog : int NA NA NA NA NA NA NA NA NA NA ...
$ daysin_Dog: int NA NA NA NA NA NA NA NA NA NA ...
$ count_Oth : int NA NA NA NA NA NA NA NA NA NA ...
$ dist_Oth : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Oth : int NA NA NA NA NA NA NA NA NA NA ...
$ daysin_Oth: int NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "reshapeWide")=List of 5
..$ v.names: chr [1:4] "count" "dist" "indor" "daysin"
..$ timevar: chr "anim"
..$ idvar : chr "FSN"
..$ times : chr [1:7] "Goa" "Pou" "Buf" "Cow" ...
..$ varying: chr [1:4, 1:7] "count_Goa" "dist_Goa" "indor_Goa" "daysin_Goa" ...
[1] "data.frame"
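To see what reshape() does on a small scale, here is a minimal sketch on made-up data. The data frame `long` and its values are hypothetical; only the column names and the reshape() arguments mirror the call above.

```r
# Toy long-format data: one row per household (FSN) and animal type (anim).
# The values are made up for illustration only.
long <- data.frame(
  FSN   = c(1, 1, 2, 3),
  anim  = c("Goa", "Pou", "Cow", "Goa"),
  count = c(3, 1, 2, 7),
  dist  = c(5, 0, 0, 0)
)

# One row per FSN; count and dist are replicated for every animal type.
wide <- reshape(long, direction = "wide", timevar = "anim", idvar = "FSN",
                v.names = c("count", "dist"), sep = "_")

print(wide)
```

Household 2 has no goat row in `long`, so count_Goa and dist_Goa are NA for it in `wide`. This is the same mechanism that produces the many NA values in the real data.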
We observe:
We now have fewer rows (7147, down from 9029 in the long format)
There are many NA values (households that do not have that animal)
There are also some -1 values
1.2 Replacing unnecessary values
We want to replace NA and -1 with zero in the count, indor and daysin columns (the dist columns are left as they are)
[1] "FSN" "count_Goa" "dist_Goa" "indor_Goa" "daysin_Goa"
[6] "count_Pou" "dist_Pou" "indor_Pou" "daysin_Pou" "count_Buf"
[11] "dist_Buf" "indor_Buf" "daysin_Buf" "count_Cow" "dist_Cow"
[16] "indor_Cow" "daysin_Cow" "count_Pig" "dist_Pig" "indor_Pig"
[21] "daysin_Pig" "count_Dog" "dist_Dog" "indor_Dog" "daysin_Dog"
[26] "count_Oth" "dist_Oth" "indor_Oth" "daysin_Oth"
#str(animals)
#View(animals)
animals$indor_Pou[is.na(animals$indor_Pou)] <- 0
animals$indor_Pig[is.na(animals$indor_Pig)] <- 0
animals$indor_Oth[is.na(animals$indor_Oth)] <- 0
animals$indor_Goa[is.na(animals$indor_Goa)] <- 0
animals$indor_Dog[is.na(animals$indor_Dog)] <- 0
animals$indor_Cow[is.na(animals$indor_Cow)] <- 0
animals$indor_Buf[is.na(animals$indor_Buf)] <- 0
animals$daysin_Pou[is.na(animals$daysin_Pou) | animals$daysin_Pou == -1] <- 0
animals$daysin_Pig[is.na(animals$daysin_Pig) | animals$daysin_Pig == -1] <- 0
animals$daysin_Oth[is.na(animals$daysin_Oth) | animals$daysin_Oth == -1] <- 0
animals$daysin_Goa[is.na(animals$daysin_Goa) | animals$daysin_Goa == -1] <- 0
animals$daysin_Dog[is.na(animals$daysin_Dog) | animals$daysin_Dog == -1] <- 0
animals$daysin_Cow[is.na(animals$daysin_Cow) | animals$daysin_Cow == -1] <- 0
animals$daysin_Buf[is.na(animals$daysin_Buf) | animals$daysin_Buf == -1] <- 0
animals$count_Pou[is.na(animals$count_Pou)| animals$count_Pou == -1] <- 0
animals$count_Pig[is.na(animals$count_Pig)| animals$count_Pig == -1] <- 0
animals$count_Oth[is.na(animals$count_Oth)| animals$count_Oth == -1] <- 0
animals$count_Goa[is.na(animals$count_Goa)| animals$count_Goa == -1] <- 0
animals$count_Dog[is.na(animals$count_Dog)| animals$count_Dog == -1] <- 0
animals$count_Cow[is.na(animals$count_Cow)| animals$count_Cow == -1] <- 0
animals$count_Buf[is.na(animals$count_Buf)| animals$count_Buf == -1] <- 0
str(animals)
'data.frame': 7147 obs. of 29 variables:
$ FSN : int 45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
$ count_Goa : num 3 0 0 2 0 7 1 3 3 0 ...
$ dist_Goa : int 5 NA NA 5 NA 0 5 15 0 NA ...
$ indor_Goa : num 1 0 0 1 0 1 1 1 1 0 ...
$ daysin_Goa: num 120 0 0 180 0 360 150 90 90 0 ...
$ count_Pou : num 1 0 0 0 0 0 0 0 0 2 ...
$ dist_Pou : int 0 NA NA NA NA NA NA NA NA 0 ...
$ indor_Pou : num 1 0 0 0 0 0 0 0 0 1 ...
$ daysin_Pou: num 365 0 0 0 0 0 0 0 0 360 ...
$ count_Buf : num 0 1 0 0 0 0 0 0 0 0 ...
$ dist_Buf : int NA 4 NA NA NA NA NA NA NA NA ...
$ indor_Buf : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Buf: num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Cow : num 0 0 1 0 2 0 3 0 0 0 ...
$ dist_Cow : int NA NA 3 NA 0 NA 15 NA NA NA ...
$ indor_Cow : num 0 0 1 0 1 0 0 0 0 0 ...
$ daysin_Cow: num 0 0 210 0 360 0 0 0 0 0 ...
$ count_Pig : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Pig : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Pig : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Pig: num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Dog : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Dog : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Dog : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Dog: num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Oth : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Oth : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Oth : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Oth: num 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "reshapeWide")=List of 5
..$ v.names: chr [1:4] "count" "dist" "indor" "daysin"
..$ timevar: chr "anim"
..$ idvar : chr "FSN"
..$ times : chr [1:7] "Goa" "Pou" "Buf" "Cow" ...
..$ varying: chr [1:4, 1:7] "count_Goa" "dist_Goa" "indor_Goa" "daysin_Goa" ...
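The 21 replacement lines above can also be written as a loop. The following is only a sketch on a made-up data frame (`animals_toy` is hypothetical), assuming the same rule as the original code: NA becomes 0 in the count, indor and daysin columns, and -1 becomes 0 in the count and daysin columns only.

```r
# Toy stand-in for the wide 'animals' data frame (values are made up).
animals_toy <- data.frame(
  FSN        = c(1, 2, 3),
  count_Cow  = c(NA, 2, -1),
  indor_Cow  = c(NA, 1, 0),
  daysin_Cow = c(NA, 360, -1),
  dist_Cow   = c(NA, 0, 15)
)

# Columns to recode: every count_*, indor_* and daysin_* column.
cols <- grep("^(count|indor|daysin)_", names(animals_toy), value = TRUE)

for (v in cols) {
  x <- animals_toy[[v]]
  x[is.na(x)] <- 0                  # NA -> 0 for all three groups
  if (!startsWith(v, "indor_")) {   # the original code recodes -1 only
    x[x == -1] <- 0                 # for count_* and daysin_*
  }
  animals_toy[[v]] <- x
}

str(animals_toy)
```

The dist_* columns are not matched by the pattern, so they keep their NA values, as in the output above.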
2. Create an asset index
This index should include information on consumer goods, as well as whether the household owns a bovine animal and whether the house has brick walls.
2.1. Select consumer goods
Select the consumer goods (from the Q1_B_106 dataset) that will make up the asset index.
2.1.1. Importing the CSV file under the name Q1_B_106, with “,” as the separator and “.” as the decimal mark:
2.1.2. Inspecting the structure and values of the dataset
'data.frame': 13377 obs. of 25 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ Radio : int 1 1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ CD_Player : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ BW_Television : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Color_Television: int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Video_DVD_Player: int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Mobile : int -1 -1 -1 -1 -1 1 -1 -1 -1 -1 ...
$ Non_Mobile_Phone: int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Refrigerator : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Iron : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Sewing_Machine : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Watch : int -1 -1 -1 -1 -1 1 -1 -1 -1 -1 ...
$ Pressure_Cooker : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Chairs : int 1 -1 -1 -1 1 1 -1 -1 2 -1 ...
$ Sofas : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Tables : int -1 -1 -1 -1 -1 1 -1 -1 -1 -1 ...
$ Cot_Bed : int 2 2 3 -1 2 1 2 1 2 -1 ...
$ Cupboards : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Bicycle : int 1 1 1 -1 -1 2 -1 -1 1 -1 ...
$ Motor_Cycle : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Animal_Draw_Cart: int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Car : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Tractor : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Computer : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electric_Fan : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
According to the data dictionary:
-1 : Missing or not available
0, 1, 2, 3, ... : Number of items the household has
2.1.3. Checking the summary of the variables
FSN Radio CD_Player BW_Television
Min. :45001 Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
1st Qu.:48527 1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000
Median :52032 Median :-1.0000 Median :-1.0000 Median :-1.0000
Mean :52110 Mean :-0.7555 Mean :-0.9517 Mean :-0.9015
3rd Qu.:55671 3rd Qu.:-1.0000 3rd Qu.:-1.0000 3rd Qu.:-1.0000
Max. :59606 Max. : 6.0000 Max. : 3.0000 Max. : 2.0000
Color_Television Video_DVD_Player Mobile Non_Mobile_Phone
Min. :-1.0000 Min. :-1.0000 Min. :-1.00000 Min. :-1.0000
1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.00000 1st Qu.:-1.0000
Median :-1.0000 Median :-1.0000 Median :-1.00000 Median :-1.0000
Mean :-0.9217 Mean :-0.9785 Mean : 0.01323 Mean :-0.9885
3rd Qu.:-1.0000 3rd Qu.:-1.0000 3rd Qu.: 1.00000 3rd Qu.:-1.0000
Max. : 2.0000 Max. : 1.0000 Max. :10.00000 Max. : 1.0000
Refrigerator Iron Sewing_Machine Watch
Min. :-1.0000 Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000
Median :-1.0000 Median :-1.0000 Median :-1.0000 Median :-1.0000
Mean :-0.9937 Mean :-0.9415 Mean :-0.8969 Mean :-0.3998
3rd Qu.:-1.0000 3rd Qu.:-1.0000 3rd Qu.:-1.0000 3rd Qu.: 1.0000
Max. : 1.0000 Max. : 4.0000 Max. : 3.0000 Max. :12.0000
Pressure_Cooker Chairs Sofas Tables
Min. :-1.0000 Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000
Median :-1.0000 Median :-1.0000 Median :-1.0000 Median :-1.0000
Mean :-0.8435 Mean : 0.4572 Mean :-0.9602 Mean :-0.2501
3rd Qu.:-1.0000 3rd Qu.: 2.0000 3rd Qu.:-1.0000 3rd Qu.: 1.0000
Max. : 5.0000 Max. :40.0000 Max. : 8.0000 Max. :10.0000
Cot_Bed Cupboards Bicycle Motor_Cycle
Min. :-1.000 Min. :-1.0000 Min. :-1.00000 Min. :-1.0000
1st Qu.: 2.000 1st Qu.:-1.0000 1st Qu.:-1.00000 1st Qu.:-1.0000
Median : 2.000 Median :-1.0000 Median : 1.00000 Median :-1.0000
Mean : 2.599 Mean :-0.9624 Mean : 0.09793 Mean :-0.8656
3rd Qu.: 3.000 3rd Qu.:-1.0000 3rd Qu.: 1.00000 3rd Qu.:-1.0000
Max. :24.000 Max. : 7.0000 Max. : 6.00000 Max. : 5.0000
Animal_Draw_Cart Car Tractor Computer
Min. :-1.0000 Min. :-1.0000 Min. :-1.0000 Min. :-1.0000
1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000 1st Qu.:-1.0000
Median :-1.0000 Median :-1.0000 Median :-1.0000 Median :-1.0000
Mean :-0.9919 Mean :-0.9938 Mean :-0.9874 Mean :-0.9982
3rd Qu.:-1.0000 3rd Qu.:-1.0000 3rd Qu.:-1.0000 3rd Qu.:-1.0000
Max. : 1.0000 Max. : 2.0000 Max. : 3.0000 Max. : 1.0000
Electric_Fan
Min. :-1.0000
1st Qu.:-1.0000
Median :-1.0000
Mean :-0.6872
3rd Qu.:-1.0000
Max. :11.0000
Example of what we can see: The maximum number of radios in a household is 6
2.1.4. Transform to dichotomous variables
We need every ‘-1’ to be converted to ‘0’,
and every value ≥ 1 to be converted to ‘1’.
# We can do it with ifelse():
Q1_B_106$own_Radio <- ifelse(Q1_B_106$Radio>0,1,0)
# Another way is:
#Q1_B_106$own_Radio <- as.numeric(Q1_B_106$Radio > 0)
# This creates "own_Radio", which is TRUE only when "Radio" is greater than 0, and converts it to a number with as.numeric (1: TRUE, 0: FALSE)
# Check that it worked
table(Q1_B_106$own_Radio, Q1_B_106$Radio, useNA = "always")
-1 1 2 3 6 <NA>
0 11753 0 0 0 0 0
1 0 1608 12 3 1 0
<NA> 0 0 0 0 0 0
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.1214 0.0000 1.0000
As a result we can see that 12.14% of households have at least one radio. Remember: when a variable only takes the values 0 and 1, its mean is the proportion of observations with the value 1.
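As a quick check of that rule, the sketch below rebuilds the Radio variable from the counts in the cross-table above and compares the mean of the 0/1 indicator with the proportion computed by hand. The vector names are illustrative only.

```r
# Counts per radio-ownership level, taken from the table above.
radio <- rep(c(-1, 1, 2, 3, 6), times = c(11753, 1608, 12, 3, 1))

own_radio <- as.numeric(radio > 0)   # 1 = owns at least one radio, 0 = none

mean(own_radio)                      # proportion of owners
sum(own_radio) / length(own_radio)   # same value, computed by hand
```

Both lines give 1624 / 13377, which rounds to the 0.1214 shown in the summary.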
We need to do that for every variable
Q1_B_106$own_Radio <- ifelse(Q1_B_106$Radio>0,1,0)
Q1_B_106$own_CD_Player <- ifelse(Q1_B_106$CD_Player>0,1,0)
Q1_B_106$own_BW_Television <- ifelse(Q1_B_106$BW_Television>0,1,0)
Q1_B_106$own_Color_Television <- ifelse(Q1_B_106$Color_Television>0,1,0)
Q1_B_106$own_Video_DVD_Player <- ifelse(Q1_B_106$Video_DVD_Player>0,1,0)
Q1_B_106$own_Mobile <- ifelse(Q1_B_106$Mobile>0,1,0)
Q1_B_106$own_Non_Mobile_Phone <- ifelse(Q1_B_106$Non_Mobile_Phone>0,1,0)
Q1_B_106$own_Refrigerator <- ifelse(Q1_B_106$Refrigerator>0,1,0)
Q1_B_106$own_Iron <- ifelse(Q1_B_106$Iron>0,1,0)
Q1_B_106$own_Sewing_Machine <- ifelse(Q1_B_106$Sewing_Machine>0,1,0)
Q1_B_106$own_Watch <- ifelse(Q1_B_106$Watch>0,1,0)
Q1_B_106$own_Pressure_Cooker <- ifelse(Q1_B_106$Pressure_Cooker>0,1,0)
Q1_B_106$own_Chairs <- ifelse(Q1_B_106$Chairs>0,1,0)
Q1_B_106$own_Sofas <- ifelse(Q1_B_106$Sofas>0,1,0)
Q1_B_106$own_Tables <- ifelse(Q1_B_106$Tables>0,1,0)
Q1_B_106$own_Cot_Bed <- ifelse(Q1_B_106$Cot_Bed>0,1,0)
Q1_B_106$own_Cupboards <- ifelse(Q1_B_106$Cupboards>0,1,0)
Q1_B_106$own_Bicycle <- ifelse(Q1_B_106$Bicycle>0,1,0)
Q1_B_106$own_Motor_Cycle <- ifelse(Q1_B_106$Motor_Cycle>0,1,0)
Q1_B_106$own_Animal_Draw_Cart <- ifelse(Q1_B_106$Animal_Draw_Cart>0,1,0)
Q1_B_106$own_Car <- ifelse(Q1_B_106$Car>0,1,0)
Q1_B_106$own_Tractor <- ifelse(Q1_B_106$Tractor>0,1,0)
Q1_B_106$own_Computer <- ifelse(Q1_B_106$Computer>0,1,0)
Q1_B_106$own_Electric_Fan <- ifelse(Q1_B_106$Electric_Fan>0,1,0)
2.1.5. Now we will make a subset containing only the ‘own’ variables and FSN
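The code for this subset is not displayed in the output. One possible way to build the indicators and the subset in a few lines is sketched below on a made-up data frame; `goods`, `items` and `assets_toy` are hypothetical names, and the approach assumes that every column other than FSN is an item count.

```r
# Toy stand-in for Q1_B_106 (values are made up; -1 means missing/not available).
goods <- data.frame(
  FSN     = c(45001, 45002, 45003),
  Radio   = c(1, -1, 2),
  Bicycle = c(-1, -1, 1)
)

# Every column except the household identifier is an item count.
items <- setdiff(names(goods), "FSN")

# Create one own_<item> indicator per item: 1 if the count is > 0, else 0.
for (item in items) {
  goods[[paste0("own_", item)]] <- as.numeric(goods[[item]] > 0)
}

# Keep FSN and the own_* indicators only.
assets_toy <- goods[, c("FSN", paste0("own_", items))]
str(assets_toy)
```

The loop avoids repeating one line per item, at the cost of being slightly less explicit than the line-by-line version.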
'data.frame': 13377 obs. of 25 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ own_Radio : num 1 1 0 0 0 0 0 0 0 0 ...
$ own_CD_Player : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_BW_Television : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Color_Television: num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Video_DVD_Player: num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Mobile : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Non_Mobile_Phone: num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Refrigerator : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Iron : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Sewing_Machine : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Watch : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Pressure_Cooker : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Chairs : num 1 0 0 0 1 1 0 0 1 0 ...
$ own_Sofas : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Tables : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Cot_Bed : num 1 1 1 0 1 1 1 1 1 0 ...
$ own_Cupboards : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Bicycle : num 1 1 1 0 0 1 0 0 1 0 ...
$ own_Motor_Cycle : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Animal_Draw_Cart: num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Car : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Tractor : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Computer : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Electric_Fan : num 0 0 0 0 0 0 0 0 0 0 ...
2.1.6. Now we will focus only on the variables whose mean is between 5% and 95% (0.05-0.95)
Why? Because we want to exclude the items that almost nobody has (less than 5%) and the items that almost everybody has (more than 95%)
#summary(assets)
round(sapply(assets, FUN=mean),3) #To display the mean of each variable with 3 decimals
FSN own_Radio own_CD_Player
52110.000 0.121 0.024
own_BW_Television own_Color_Television own_Video_DVD_Player
0.049 0.039 0.011
own_Mobile own_Non_Mobile_Phone own_Refrigerator
0.476 0.006 0.003
own_Iron own_Sewing_Machine own_Watch
0.028 0.051 0.264
own_Pressure_Cooker own_Chairs own_Sofas
0.072 0.429 0.017
own_Tables own_Cot_Bed own_Cupboards
0.329 0.968 0.013
own_Bicycle own_Motor_Cycle own_Animal_Draw_Cart
0.524 0.065 0.004
own_Car own_Tractor own_Computer
0.003 0.006 0.001
own_Electric_Fan
0.110
Besides FSN, 10 variables have a value between 5-95% (0.05-0.95)
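Instead of reading the means by eye, the selection can be done programmatically. This is a sketch on made-up data, assuming the 5% and 95% bounds are inclusive; `assets_toy`, `item_means`, `keep` and `assets2_toy` are hypothetical names.

```r
# Toy 0/1 ownership data (made up): one rare item, one near-universal item.
assets_toy <- data.frame(
  FSN         = 1:100,
  own_Radio   = rep(c(1, 0), times = c(12, 88)),   # mean 0.12 -> keep
  own_Car     = rep(c(1, 0), times = c(1, 99)),    # mean 0.01 -> drop
  own_Cot_Bed = rep(c(1, 0), times = c(97, 3))     # mean 0.97 -> drop
)

# Mean of each indicator (FSN is an identifier, so it is left out).
item_means <- colMeans(assets_toy[, -1])

# Names of the indicators owned by 5% to 95% of households.
keep <- names(item_means)[item_means >= 0.05 & item_means <= 0.95]

assets2_toy <- assets_toy[, c("FSN", keep)]
names(assets2_toy)
```

Selecting by name this way keeps the choice reproducible if the data or the thresholds change.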
2.1.7. We create a new subset (assets2) with only the variables whose mean is between 5% and 95% (0.05-0.95)
assets2 <- subset(assets, select = c(FSN, own_Radio,own_Mobile, own_Sewing_Machine, own_Watch, own_Pressure_Cooker, own_Chairs, own_Tables, own_Bicycle, own_Motor_Cycle, own_Electric_Fan))
str(assets2)
'data.frame': 13377 obs. of 11 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ own_Radio : num 1 1 0 0 0 0 0 0 0 0 ...
$ own_Mobile : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Sewing_Machine : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Watch : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Pressure_Cooker: num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Chairs : num 1 0 0 0 1 1 0 0 1 0 ...
$ own_Tables : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Bicycle : num 1 1 1 0 0 1 0 0 1 0 ...
$ own_Motor_Cycle : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Electric_Fan : num 0 0 0 0 0 0 0 0 0 0 ...
2.2. Create own_bovine variable
Use the previously created “animals” dataset to create a variable for owning a bovine animal.
The researchers noticed that bovine ownership is important, so we need to incorporate it. The only problem is that the animal information is in another dataset.
‘own_bov’ for each household = whether or not it owns cows or buffaloes
hlp1 <- subset(animals, select=c(FSN, count_Cow, count_Buf))
hlp1$own_bov <- NA
hlp1$own_bov[hlp1$count_Cow== 0 & hlp1$count_Buf==0] <- 0
hlp1$own_bov[hlp1$count_Cow> 0 | hlp1$count_Buf>0] <- 1
str(hlp1)
'data.frame': 7147 obs. of 4 variables:
$ FSN : int 45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
$ count_Cow: num 0 0 1 0 2 0 3 0 0 0 ...
$ count_Buf: num 0 1 0 0 0 0 0 0 0 0 ...
$ own_bov : num 0 1 1 0 1 0 1 0 0 0 ...
2.3. Create brickwall variable
Use the Q1_B dataset to create a variable indicating whether the household has brick walls.
The researchers also noticed that a “brick_wall” variable is important, so we need to incorporate it. The only problem is that the wall information is in another dataset.
2.3.1. Opening Q1_B dataset
2.3.2. Checking the variable Wall_Material in the Q1_B dataset
According to the code list:
6= Other
163= Grass
163= Bamboo
164, 165, 166= Brick
So we only need the households with the value 164 or 165 in the Wall_Material variable
2.3.3. I only need the “FSN” and “Wall_Material” variables
2.3.4. Creating a new variable called brick_wall according to what I need
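The code for this step is not displayed in the output. A minimal sketch that matches the rule stated above (codes 164 and 165 count as brick) is shown here on made-up rows; `hlp2_toy` is a hypothetical stand-in for the two-column subset of Q1_B.

```r
# Toy stand-in for the two columns kept from Q1_B (values are made up).
hlp2_toy <- data.frame(
  FSN           = c(45001, 45004, 45005, 45008),
  Wall_Material = c(164, 162, 163, 165)
)

# 1 if the wall material code is 164 or 165 (brick), 0 otherwise.
hlp2_toy$brick_wall <- as.numeric(hlp2_toy$Wall_Material %in% c(164, 165))

table(hlp2_toy$Wall_Material, hlp2_toy$brick_wall)
```

Note that %in% returns FALSE for NA, so a household with a missing wall code would be coded 0 under this sketch. Whether that is appropriate depends on how missing codes should be treated.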
2.4. Merging datasets
Merge all these datasets in order to create the asset index
2.4.1. Merging hlp1 and hlp2
'data.frame': 7147 obs. of 4 variables:
$ FSN : int 45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
$ count_Cow: num 0 0 1 0 2 0 3 0 0 0 ...
$ count_Buf: num 0 1 0 0 0 0 0 0 0 0 ...
$ own_bov : num 0 1 1 0 1 0 1 0 0 0 ...
'data.frame': 13377 obs. of 3 variables:
$ FSN : num 45001 45002 45003 45004 45005 ...
$ Wall_Material: num 164 164 164 162 163 163 162 164 164 162 ...
$ brick_wall : num 1 1 1 0 0 0 0 1 1 0 ...
'data.frame': 13377 obs. of 6 variables:
$ FSN : num 45001 45002 45003 45004 45005 ...
$ Wall_Material: num 164 164 164 162 163 163 162 164 164 162 ...
$ brick_wall : num 1 1 1 0 0 0 0 1 1 0 ...
$ count_Cow : num 0 0 1 0 NA 2 NA NA 0 NA ...
$ count_Buf : num 0 1 0 0 NA 0 NA NA 0 NA ...
$ own_bov : num 0 1 1 0 NA 1 NA NA 0 NA ...
We don't need count_Cow, count_Buf or Wall_Material
'data.frame': 13377 obs. of 3 variables:
$ FSN : num 45001 45002 45003 45004 45005 ...
$ brick_wall: num 1 1 1 0 0 0 0 1 1 0 ...
$ own_bov : num 0 1 1 0 NA 1 NA NA 0 NA ...
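The merge code itself is not displayed. A sketch that would give the same shape as the output above (every household of hlp2 kept, NA where hlp1 has no matching FSN) is a left join on FSN. The data frames below are made up and their names are hypothetical.

```r
# Toy versions of hlp1 (bovine ownership) and hlp2 (brick wall); values made up.
hlp1_toy <- data.frame(FSN = c(1, 2, 4), own_bov = c(0, 1, 1))
hlp2_toy <- data.frame(FSN = 1:5, brick_wall = c(1, 1, 1, 0, 0))

# Keep every household of hlp2_toy; households absent from hlp1_toy get NA.
merged_toy <- merge(hlp2_toy, hlp1_toy, by = "FSN", all.x = TRUE)

table(merged_toy$own_bov, useNA = "always")
```

Households 3 and 5 are missing from `hlp1_toy`, so their own_bov is NA after the merge.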
2.4.2. Merging hlp2 to assets2
'data.frame': 13377 obs. of 3 variables:
$ FSN : num 45001 45002 45003 45004 45005 ...
$ brick_wall: num 1 1 1 0 0 0 0 1 1 0 ...
$ own_bov : num 0 1 1 0 NA 1 NA NA 0 NA ...
'data.frame': 13377 obs. of 11 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ own_Radio : num 1 1 0 0 0 0 0 0 0 0 ...
$ own_Mobile : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Sewing_Machine : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Watch : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Pressure_Cooker: num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Chairs : num 1 0 0 0 1 1 0 0 1 0 ...
$ own_Tables : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Bicycle : num 1 1 1 0 0 1 0 0 1 0 ...
$ own_Motor_Cycle : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Electric_Fan : num 0 0 0 0 0 0 0 0 0 0 ...
'data.frame': 13377 obs. of 13 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ own_Radio : num 1 1 0 0 0 0 0 0 0 0 ...
$ own_Mobile : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Sewing_Machine : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Watch : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Pressure_Cooker: num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Chairs : num 1 0 0 0 1 1 0 0 1 0 ...
$ own_Tables : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Bicycle : num 1 1 1 0 0 1 0 0 1 0 ...
$ own_Motor_Cycle : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Electric_Fan : num 0 0 0 0 0 0 0 0 0 0 ...
$ brick_wall : num 1 1 1 0 0 0 0 1 1 0 ...
$ own_bov : num 0 1 1 0 NA 1 NA NA 0 NA ...
2.4.3. Check NA values
The merge creates NA values (e.g. in own_bov). These NA values occur because the household does not appear in the animals dataset, meaning it has no bovine animal, so the value should be 0
Checking NA
0 1
2816 4331
0 1 <NA>
2816 4331 6230
0 1 <NA>
2816 4331 6230
'data.frame': 13377 obs. of 13 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ own_Radio : num 1 1 0 0 0 0 0 0 0 0 ...
$ own_Mobile : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Sewing_Machine : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Watch : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Pressure_Cooker: num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Chairs : num 1 0 0 0 1 1 0 0 1 0 ...
$ own_Tables : num 0 0 0 0 0 1 0 0 0 0 ...
$ own_Bicycle : num 1 1 1 0 0 1 0 0 1 0 ...
$ own_Motor_Cycle : num 0 0 0 0 0 0 0 0 0 0 ...
$ own_Electric_Fan : num 0 0 0 0 0 0 0 0 0 0 ...
$ brick_wall : num 1 1 1 0 0 0 0 1 1 0 ...
$ own_bov : num 0 1 1 0 NA 1 NA NA 0 NA ...
2.4.4. Replacing NA
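The replacement code is not displayed. A minimal sketch, assuming (as stated in 2.4.3) that a missing own_bov means the household owns no bovine animal; `assets3_toy` is a made-up stand-in for the merged data.

```r
# Toy merged data (made up): NA marks households missing from 'animals'.
assets3_toy <- data.frame(FSN = 1:5, own_bov = c(0, 1, NA, 1, NA))

# Treat households with no animal record as not owning a bovine.
assets3_toy$own_bov[is.na(assets3_toy$own_bov)] <- 0

table(assets3_toy$own_bov, useNA = "always")
```

Running the table with useNA = "always" before and after the replacement confirms that the NA count drops to zero and the count of 0 grows by the same amount.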
2.5. Use PCA to create asset_index
Create the asset index using principal component analysis (PCA) and then categorize it into 5 wealth quintiles.
Principal component analysis (PCA) is a statistical technique that transforms a large number of variables into a smaller number of new variables (or just one). It is often used to build scores or indices.
When we build a score or index, we tend to give every variable equal weight because we assume every variable is equally important. But is that true? Usually not. PCA instead gives each variable a weight that depends on the variance in the data. The weights (loadings) are also useful for seeing which variables a component represents, and to what extent.
The idea behind PCA is to find the underlying patterns in the data using its variances. The analysis will create a first (principal) component, then a second, third, and so on. Each component captures a different aspect of the variation in the data, with the first component capturing the most variation, and subsequent components capturing progressively less.
Usually we use the first component and, with it, create a weighted score for each individual.
[1] "FSN" "own_Radio" "own_Mobile"
[4] "own_Sewing_Machine" "own_Watch" "own_Pressure_Cooker"
[7] "own_Chairs" "own_Tables" "own_Bicycle"
[10] "own_Motor_Cycle" "own_Electric_Fan" "brick_wall"
[13] "own_bov"
2.5.1. PCA command
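The PCA call is not displayed in the output. The loadings table below has the layout printed by princomp(), so a plausible sketch is given here on simulated data. The use of cor = TRUE, the exclusion of FSN, and the simulated data frame `assets3_toy` are all assumptions, not the original code.

```r
set.seed(1)  # simulated data, for illustration only

# Simulate 200 households with 4 correlated 0/1 ownership indicators:
# a latent 'wealth' value raises the chance of owning each item.
wealth <- rnorm(200)
own <- function(shift) as.numeric(wealth + rnorm(200) > shift)
assets3_toy <- data.frame(
  FSN         = 1:200,
  own_Radio   = own(1.0),
  own_Mobile  = own(0.0),
  own_Chairs  = own(0.2),
  own_Bicycle = own(-0.2)
)

# PCA on the indicators only (FSN is an identifier, not an asset).
# cor = TRUE standardises the variables; this is an assumption.
pcamod <- princomp(assets3_toy[, -1], cor = TRUE)

summary(pcamod)     # share of variance captured by each component
loadings(pcamod)    # weight of each variable in each component

# Scores of the first component, one per household.
assets3_toy$PC1 <- pcamod$scores[, 1]
```

The sign of a component is arbitrary, so it is worth checking that the Comp.1 loadings are positive before reading a higher PC1 as "more assets".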
2.5.2. Inspect the component loadings of Comp.1
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
own_Radio 0.235 0.106 0.288 0.506 0.724 0.208
own_Mobile 0.321 0.245 -0.106 0.173 -0.415 -0.498
own_Sewing_Machine 0.235 -0.274 0.302 0.125 -0.210 -0.799 0.203 -0.131
own_Watch 0.351 0.120 -0.199 0.427
own_Pressure_Cooker 0.310 -0.348 0.220 -0.195 0.415
own_Chairs 0.357 0.183 -0.334 -0.198 0.388
own_Tables 0.362 -0.333 -0.216 0.115 0.420
own_Bicycle 0.246 0.501 0.137 -0.150 -0.448
own_Motor_Cycle 0.296 -0.280 0.277 -0.205 0.309 -0.588
own_Electric_Fan 0.328 -0.299 -0.228 0.286 -0.159 0.101
brick_wall 0.215 -0.452 -0.540 0.563 -0.357
own_bov 0.523 0.485 -0.523 0.317 0.121
Comp.9 Comp.10 Comp.11 Comp.12
own_Radio
own_Mobile -0.289 0.508 -0.162
own_Sewing_Machine 0.140
own_Watch -0.676 -0.420
own_Pressure_Cooker 0.310 0.280 -0.573
own_Chairs 0.132 -0.711
own_Tables 0.147 0.696
own_Bicycle 0.516 -0.379 0.145
own_Motor_Cycle -0.479 -0.186
own_Electric_Fan 0.260 0.745
brick_wall
own_bov -0.225 0.196
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
Cumulative Var 0.083 0.167 0.250 0.333 0.417 0.500 0.583 0.667 0.750
Comp.10 Comp.11 Comp.12
SS loadings 1.000 1.000 1.000
Proportion Var 0.083 0.083 0.083
Cumulative Var 0.833 0.917 1.000
We have to choose one component and use its scores
2.5.3. Extract the component scores
Create a variable named PC1 in assets3 from the scores in pcamod
2.5.4. Check the quintiles
0% 20% 40% 60% 80% 100%
-2.11118368 -1.66100172 -0.98271146 -0.06987735 1.45281527 7.18120558
2.5.5. Create a new categorical variable ‘asset_index’ with values 1-5 by quintile
assets3$asset_index <- NA
assets3$asset_index[assets3$PC1 >= -2.11118368 & assets3$PC1 <=-1.66100172 ] <- "1"
assets3$asset_index[assets3$PC1 >-1.66100172 & assets3$PC1 <=-0.98271146 ] <- "2"
assets3$asset_index[assets3$PC1 >-0.98271146 & assets3$PC1 <=-0.06987735 ] <- "3"
assets3$asset_index[assets3$PC1 >-0.06987735 & assets3$PC1 <=1.45281527 ] <- "4"
assets3$asset_index[assets3$PC1 >1.45281527 & assets3$PC1 <=7.18120558 ] <- "5"
2.5.6. Drop unnecessary variables by making a subset
Our final dataset (with the asset index) only needs to have FSN and asset_index
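The hard-coded cut points in 2.5.5 work, but they must be retyped whenever the scores change. An alternative is sketched here with quantile() and cut() on simulated scores, followed by the subset described above. `assets3_toy` and `assets4_toy` are hypothetical names, and the result is a factor rather than the character variable used in the original code.

```r
set.seed(1)  # simulated scores, for illustration only
assets3_toy <- data.frame(FSN = 1:500, PC1 = rnorm(500))

# Quintile cut points: minimum, 20%, 40%, 60%, 80%, maximum.
breaks <- quantile(assets3_toy$PC1, probs = seq(0, 1, by = 0.2))

# Assign 1-5 by quintile; include.lowest = TRUE keeps the minimum in group 1.
assets3_toy$asset_index <- cut(assets3_toy$PC1, breaks = breaks,
                               labels = 1:5, include.lowest = TRUE)

# Keep only the identifier and the index.
assets4_toy <- assets3_toy[, c("FSN", "asset_index")]
table(assets4_toy$asset_index)
```

One caveat: cut() stops with an error if two cut points are equal, which can happen when many households share exactly the same score. The quintiles shown in 2.5.4 are all distinct, so that case does not arise there.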
3. Merging all the databases
3.1. Final Merging
(Questionnaire_1 + asset index + animals + Q1_B + Q1_Screening)
3.1.1 Opening Questionnaire_1
'data.frame': 13377 obs. of 10 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ panchyat_id : int 2 2 2 2 2 2 2 2 2 2 ...
$ village_id : int 11 11 11 11 11 11 11 11 11 11 ...
$ ward_no : int 1 1 1 1 1 1 1 1 1 1 ...
$ household_no : int 21 680 290 3390 181 3501 1401 2330 3750 2371 ...
$ household_head_age : int 50 35 60 35 40 40 60 23 25 40 ...
$ household_head_sex : int 2 2 2 2 2 2 2 2 2 3 ...
$ household_head_religion: int 4 4 4 4 4 4 4 4 4 4 ...
$ household_head_caste : int 9 9 9 9 9 9 9 9 9 9 ...
$ household_head_subcaste: chr "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
3.1.2 Merging with the assets
'data.frame': 13377 obs. of 11 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ asset_index : chr "4" "3" "2" "1" ...
$ panchyat_id : int 2 2 2 2 2 2 2 2 2 2 ...
$ village_id : int 11 11 11 11 11 11 11 11 11 11 ...
$ ward_no : int 1 1 1 1 1 1 1 1 1 1 ...
$ household_no : int 21 680 290 3390 181 3501 1401 2330 3750 2371 ...
$ household_head_age : int 50 35 60 35 40 40 60 23 25 40 ...
$ household_head_sex : int 2 2 2 2 2 2 2 2 2 3 ...
$ household_head_religion: int 4 4 4 4 4 4 4 4 4 4 ...
$ household_head_caste : int 9 9 9 9 9 9 9 9 9 9 ...
$ household_head_subcaste: chr "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
3.1.3 Merging with animals
'data.frame': 7147 obs. of 29 variables:
$ FSN : int 45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
$ count_Goa : num 3 0 0 2 0 7 1 3 3 0 ...
$ dist_Goa : int 5 NA NA 5 NA 0 5 15 0 NA ...
$ indor_Goa : num 1 0 0 1 0 1 1 1 1 0 ...
$ daysin_Goa: num 120 0 0 180 0 360 150 90 90 0 ...
$ count_Pou : num 1 0 0 0 0 0 0 0 0 2 ...
$ dist_Pou : int 0 NA NA NA NA NA NA NA NA 0 ...
$ indor_Pou : num 1 0 0 0 0 0 0 0 0 1 ...
$ daysin_Pou: num 365 0 0 0 0 0 0 0 0 360 ...
$ count_Buf : num 0 1 0 0 0 0 0 0 0 0 ...
$ dist_Buf : int NA 4 NA NA NA NA NA NA NA NA ...
$ indor_Buf : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Buf: num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Cow : num 0 0 1 0 2 0 3 0 0 0 ...
$ dist_Cow : int NA NA 3 NA 0 NA 15 NA NA NA ...
$ indor_Cow : num 0 0 1 0 1 0 0 0 0 0 ...
$ daysin_Cow: num 0 0 210 0 360 0 0 0 0 0 ...
$ count_Pig : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Pig : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Pig : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Pig: num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Dog : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Dog : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Dog : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Dog: num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Oth : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Oth : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Oth : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Oth: num 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "reshapeWide")=List of 5
..$ v.names: chr [1:4] "count" "dist" "indor" "daysin"
..$ timevar: chr "anim"
..$ idvar : chr "FSN"
..$ times : chr [1:7] "Goa" "Pou" "Buf" "Cow" ...
..$ varying: chr [1:4, 1:7] "count_Goa" "dist_Goa" "indor_Goa" "daysin_Goa" ...
'data.frame': 13377 obs. of 11 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ asset_index : chr "4" "3" "2" "1" ...
$ panchyat_id : int 2 2 2 2 2 2 2 2 2 2 ...
$ village_id : int 11 11 11 11 11 11 11 11 11 11 ...
$ ward_no : int 1 1 1 1 1 1 1 1 1 1 ...
$ household_no : int 21 680 290 3390 181 3501 1401 2330 3750 2371 ...
$ household_head_age : int 50 35 60 35 40 40 60 23 25 40 ...
$ household_head_sex : int 2 2 2 2 2 2 2 2 2 3 ...
$ household_head_religion: int 4 4 4 4 4 4 4 4 4 4 ...
$ household_head_caste : int 9 9 9 9 9 9 9 9 9 9 ...
$ household_head_subcaste: chr "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
Quest1_assets4_animals <- merge(Quest1_assets4, animals, all=TRUE, by = "FSN")
str(Quest1_assets4_animals)
'data.frame': 13377 obs. of 39 variables:
$ FSN : int 45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
$ asset_index : chr "4" "3" "2" "1" ...
$ panchyat_id : int 2 2 2 2 2 2 2 2 2 2 ...
$ village_id : int 11 11 11 11 11 11 11 11 11 11 ...
$ ward_no : int 1 1 1 1 1 1 1 1 1 1 ...
$ household_no : int 21 680 290 3390 181 3501 1401 2330 3750 2371 ...
$ household_head_age : int 50 35 60 35 40 40 60 23 25 40 ...
$ household_head_sex : int 2 2 2 2 2 2 2 2 2 3 ...
$ household_head_religion: int 4 4 4 4 4 4 4 4 4 4 ...
$ household_head_caste : int 9 9 9 9 9 9 9 9 9 9 ...
$ household_head_subcaste: chr "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
$ count_Goa : num 3 0 0 2 NA 0 NA NA 7 NA ...
$ dist_Goa : int 5 NA NA 5 NA NA NA NA 0 NA ...
$ indor_Goa : num 1 0 0 1 NA 0 NA NA 1 NA ...
$ daysin_Goa : num 120 0 0 180 NA 0 NA NA 360 NA ...
$ count_Pou : num 1 0 0 0 NA 0 NA NA 0 NA ...
$ dist_Pou : int 0 NA NA NA NA NA NA NA NA NA ...
$ indor_Pou : num 1 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Pou : num 365 0 0 0 NA 0 NA NA 0 NA ...
$ count_Buf : num 0 1 0 0 NA 0 NA NA 0 NA ...
$ dist_Buf : int NA 4 NA NA NA NA NA NA NA NA ...
$ indor_Buf : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Buf : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ count_Cow : num 0 0 1 0 NA 2 NA NA 0 NA ...
$ dist_Cow : int NA NA 3 NA NA 0 NA NA NA NA ...
$ indor_Cow : num 0 0 1 0 NA 1 NA NA 0 NA ...
$ daysin_Cow : num 0 0 210 0 NA 360 NA NA 0 NA ...
$ count_Pig : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ dist_Pig : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Pig : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Pig : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ count_Dog : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ dist_Dog : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Dog : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Dog : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ count_Oth : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ dist_Oth : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Oth : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Oth : num 0 0 0 0 NA 0 NA NA 0 NA ...
3.1.4 Merging with Q1_B
Q1_B <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2_DataAnalysis_ThWk/Material/Q1_B.csv", sep=",", dec= ".")
str(Q1_B)
'data.frame': 13377 obs. of 40 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ FSN : num 45001 45002 45003 45004 45005 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Neem_Tree_Distance : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Size : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Age : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Usage : chr "" "" "" "" ...
$ Neem_Tree_Use_Other: chr "" "" "" "" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Bamboo_Tree_Dist : num 3 10 16 17 5 4 15 3 1 10 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 1 1 0 ...
$ Banana_Tree_Dist : num -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num 4 12 16 13 4 13 10 10 7 12 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ No_Mosquito_Net : num 0 2 1 0 0 0 0 0 1 0 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 155 153 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 0 1 ...
$ Roof_Material : num 161 159 158 156 156 156 156 158 158 156 ...
$ Other_Roof_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 164 164 162 163 163 162 164 164 162 ...
$ Other_Wall_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 1 1 1 0 0 0 0 1 1 0 ...
$ Granaries_in_HH : num 1 1 1 0 1 1 0 1 1 0 ...
$ Source_Drink_Water : num 167 167 92 92 92 92 92 92 92 92 ...
$ Other_Src_Drink_Wat: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 177 177 177 177 177 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Of_Rooms : num 2 2 3 1 2 3 2 1 3 1 ...
$ No_Sleeping_Rooms : num 2 2 2 1 2 2 2 1 1 1 ...
# Merge household characteristics (Q1_B) with the assets/animals dataset by FSN
Quest1_assets4_animals_q1B <- merge(Q1_B, Quest1_assets4_animals, all=TRUE, by = "FSN")
str(Quest1_assets4_animals_q1B)
'data.frame': 13377 obs. of 78 variables:
$ FSN : num 45001 45002 45003 45004 45005 ...
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Neem_Tree_Distance : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Size : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Age : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Usage : chr "" "" "" "" ...
$ Neem_Tree_Use_Other : chr "" "" "" "" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Bamboo_Tree_Dist : num 3 10 16 17 5 4 15 3 1 10 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 1 1 0 ...
$ Banana_Tree_Dist : num -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num 4 12 16 13 4 13 10 10 7 12 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ No_Mosquito_Net : num 0 2 1 0 0 0 0 0 1 0 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 155 153 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 0 1 ...
$ Roof_Material : num 161 159 158 156 156 156 156 158 158 156 ...
$ Other_Roof_Material : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 164 164 162 163 163 162 164 164 162 ...
$ Other_Wall_Material : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 1 1 1 0 0 0 0 1 1 0 ...
$ Granaries_in_HH : num 1 1 1 0 1 1 0 1 1 0 ...
$ Source_Drink_Water : num 167 167 92 92 92 92 92 92 92 92 ...
$ Other_Src_Drink_Wat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 177 177 177 177 177 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Of_Rooms : num 2 2 3 1 2 3 2 1 3 1 ...
$ No_Sleeping_Rooms : num 2 2 2 1 2 2 2 1 1 1 ...
$ asset_index : chr "4" "3" "2" "1" ...
$ panchyat_id : int 2 2 2 2 2 2 2 2 2 2 ...
$ village_id : int 11 11 11 11 11 11 11 11 11 11 ...
$ ward_no : int 1 1 1 1 1 1 1 1 1 1 ...
$ household_no : int 21 680 290 3390 181 3501 1401 2330 3750 2371 ...
$ household_head_age : int 50 35 60 35 40 40 60 23 25 40 ...
$ household_head_sex : int 2 2 2 2 2 2 2 2 2 3 ...
$ household_head_religion: int 4 4 4 4 4 4 4 4 4 4 ...
$ household_head_caste : int 9 9 9 9 9 9 9 9 9 9 ...
$ household_head_subcaste: chr "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
$ count_Goa : num 3 0 0 2 NA 0 NA NA 7 NA ...
$ dist_Goa : int 5 NA NA 5 NA NA NA NA 0 NA ...
$ indor_Goa : num 1 0 0 1 NA 0 NA NA 1 NA ...
$ daysin_Goa : num 120 0 0 180 NA 0 NA NA 360 NA ...
$ count_Pou : num 1 0 0 0 NA 0 NA NA 0 NA ...
$ dist_Pou : int 0 NA NA NA NA NA NA NA NA NA ...
$ indor_Pou : num 1 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Pou : num 365 0 0 0 NA 0 NA NA 0 NA ...
$ count_Buf : num 0 1 0 0 NA 0 NA NA 0 NA ...
$ dist_Buf : int NA 4 NA NA NA NA NA NA NA NA ...
$ indor_Buf : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Buf : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ count_Cow : num 0 0 1 0 NA 2 NA NA 0 NA ...
$ dist_Cow : int NA NA 3 NA NA 0 NA NA NA NA ...
$ indor_Cow : num 0 0 1 0 NA 1 NA NA 0 NA ...
$ daysin_Cow : num 0 0 210 0 NA 360 NA NA 0 NA ...
$ count_Pig : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ dist_Pig : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Pig : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Pig : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ count_Dog : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ dist_Dog : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Dog : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Dog : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ count_Oth : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ dist_Oth : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Oth : num 0 0 0 0 NA 0 NA NA 0 NA ...
$ daysin_Oth : num 0 0 0 0 NA 0 NA NA 0 NA ...
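Because `all=TRUE` keeps unmatched rows from both sides, it can help to confirm that FSN is unique in each household-level table before merging. The following is only a minimal sketch on toy data: the data frames `hh_a` and `hh_b` and their values are hypothetical stand-ins, not the real files.

```r
# Toy stand-ins for two household-level tables (hypothetical values).
hh_a <- data.frame(FSN = c(45001, 45002, 45003), Bamboo_Tree = c(1, 1, 0))
hh_b <- data.frame(FSN = c(45001, 45002, 45004), count_Goa = c(3, 0, 2))

# A household-level table should have exactly one row per FSN.
stopifnot(anyDuplicated(hh_a$FSN) == 0, anyDuplicated(hh_b$FSN) == 0)

# Full outer join: households present in only one table are kept,
# with NA in the columns that come from the other table.
merged <- merge(hh_a, hh_b, by = "FSN", all = TRUE)

# With unique keys, the result has one row per distinct FSN across both tables.
n_expected <- length(union(hh_a$FSN, hh_b$FSN))
stopifnot(nrow(merged) == n_expected)

# Households that appear in only one of the two tables.
only_in_a <- setdiff(hh_a$FSN, hh_b$FSN)
only_in_b <- setdiff(hh_b$FSN, hh_a$FSN)
print(merged)
```

Households listed in only one table end up with NA in the other table's columns, which would be consistent with the NA values seen in the animal columns of the merged output.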
3.1.5 Opening Q1_Screening.csv
Q1_Screening <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2_DataAnalysis_ThWk/Material/Q1_Screening.csv", sep=",", dec= ".")
str(Q1_Screening)
'data.frame': 81214 obs. of 10 variables:
$ FSN : int 45001 45001 45001 45001 45001 45001 45002 45002 45002 45002 ...
$ member_id : int 1 2 3 4 5 6 1 2 3 4 ...
$ member_age : int 50 48 25 22 0 11 35 33 17 15 ...
$ member_sex : int 2 3 2 3 2 2 2 3 2 2 ...
$ fever_gt_3_days : int 0 0 0 0 0 0 0 0 0 0 ...
$ suffered_vl_since_2nd_survey: int 0 0 0 0 0 0 0 0 0 0 ...
$ date_diagnosis : chr "" "" "" "" ...
$ treatment_place : chr "-1" "-1" "-1" "-1" ...
$ current_status : int 0 0 4 0 0 0 0 0 4 0 ...
$ datedis : int NA NA NA NA NA NA NA NA NA NA ...
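The screening data uses placeholder values such as "" in date_diagnosis and "-1" in treatment_place. If these mean "not applicable" (an assumption; the codebook should confirm it), they could be recoded to NA before analysis. A minimal sketch on toy data, where `screening_toy` and its non-placeholder values are hypothetical:

```r
# Toy stand-in for the person-level screening table (hypothetical values).
screening_toy <- data.frame(
  FSN             = c(45001, 45001, 45002),
  date_diagnosis  = c("", "", "2010-05-01"),
  treatment_place = c("-1", "-1", "PHC"),
  stringsAsFactors = FALSE
)

# Assumption: "" and "-1" are placeholders for missing / not applicable.
screening_toy$date_diagnosis[screening_toy$date_diagnosis == ""] <- NA
screening_toy$treatment_place[screening_toy$treatment_place == "-1"] <- NA

# Count how many values were treated as missing in each column.
print(colSums(is.na(screening_toy)))
```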
3.1.6 Merging with the merged household dataset
# Merge person-level screening data with the household-level dataset by FSN
Quest1_assets4_animals_q1B_Persons <- merge(Q1_Screening, Quest1_assets4_animals_q1B, all=TRUE, by = "FSN")
str(Quest1_assets4_animals_q1B_Persons)
'data.frame': 81214 obs. of 87 variables:
$ FSN : int 45001 45001 45001 45001 45001 45001 45002 45002 45002 45002 ...
$ member_id : int 1 2 3 4 5 6 1 2 3 4 ...
$ member_age : int 50 48 25 22 0 11 35 33 17 15 ...
$ member_sex : int 2 3 2 3 2 2 2 3 2 2 ...
$ fever_gt_3_days : int 0 0 0 0 0 0 0 0 0 0 ...
$ suffered_vl_since_2nd_survey: int 0 0 0 0 0 0 0 0 0 0 ...
$ date_diagnosis : chr "" "" "" "" ...
$ treatment_place : chr "-1" "-1" "-1" "-1" ...
$ current_status : int 0 0 4 0 0 0 0 0 4 0 ...
$ datedis : int NA NA NA NA NA NA NA NA NA NA ...
$ ID : int 1 1 1 1 1 1 2 2 2 2 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Neem_Tree_Distance : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Size : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Age : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Usage : chr "" "" "" "" ...
$ Neem_Tree_Use_Other : chr "" "" "" "" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Bamboo_Tree_Dist : num 3 3 3 3 3 3 10 10 10 10 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Banana_Tree_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num 4 4 4 4 4 4 12 12 12 12 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ No_Mosquito_Net : num 0 0 0 0 0 0 2 2 2 2 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 153 153 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 1 1 ...
$ Roof_Material : num 161 161 161 161 161 161 159 159 159 159 ...
$ Other_Roof_Material : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 164 164 164 164 164 164 164 164 164 ...
$ Other_Wall_Material : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 1 1 1 1 1 1 1 1 1 1 ...
$ Granaries_in_HH : num 1 1 1 1 1 1 1 1 1 1 ...
$ Source_Drink_Water : num 167 167 167 167 167 167 167 167 167 167 ...
$ Other_Src_Drink_Wat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 177 177 177 177 177 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Of_Rooms : num 2 2 2 2 2 2 2 2 2 2 ...
$ No_Sleeping_Rooms : num 2 2 2 2 2 2 2 2 2 2 ...
$ asset_index : chr "4" "4" "4" "4" ...
$ panchyat_id : int 2 2 2 2 2 2 2 2 2 2 ...
$ village_id : int 11 11 11 11 11 11 11 11 11 11 ...
$ ward_no : int 1 1 1 1 1 1 1 1 1 1 ...
$ household_no : int 21 21 21 21 21 21 680 680 680 680 ...
$ household_head_age : int 50 50 50 50 50 50 35 35 35 35 ...
$ household_head_sex : int 2 2 2 2 2 2 2 2 2 2 ...
$ household_head_religion : int 4 4 4 4 4 4 4 4 4 4 ...
$ household_head_caste : int 9 9 9 9 9 9 9 9 9 9 ...
$ household_head_subcaste : chr "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
$ count_Goa : num 3 3 3 3 3 3 0 0 0 0 ...
$ dist_Goa : int 5 5 5 5 5 5 NA NA NA NA ...
$ indor_Goa : num 1 1 1 1 1 1 0 0 0 0 ...
$ daysin_Goa : num 120 120 120 120 120 120 0 0 0 0 ...
$ count_Pou : num 1 1 1 1 1 1 0 0 0 0 ...
$ dist_Pou : int 0 0 0 0 0 0 NA NA NA NA ...
$ indor_Pou : num 1 1 1 1 1 1 0 0 0 0 ...
$ daysin_Pou : num 365 365 365 365 365 365 0 0 0 0 ...
$ count_Buf : num 0 0 0 0 0 0 1 1 1 1 ...
$ dist_Buf : int NA NA NA NA NA NA 4 4 4 4 ...
$ indor_Buf : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Buf : num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Cow : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Cow : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Cow : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Cow : num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Pig : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Pig : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Pig : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Pig : num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Dog : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Dog : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Dog : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Dog : num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Oth : num 0 0 0 0 0 0 0 0 0 0 ...
$ dist_Oth : int NA NA NA NA NA NA NA NA NA NA ...
$ indor_Oth : num 0 0 0 0 0 0 0 0 0 0 ...
$ daysin_Oth : num 0 0 0 0 0 0 0 0 0 0 ...
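This merge is one-to-many: each household row is repeated for every member of that household, which is why the result keeps the 81214 person rows. A minimal sketch of the same pattern on toy data (the data frames `persons` and `households` and their values are hypothetical):

```r
# Toy person-level and household-level tables (hypothetical values).
persons <- data.frame(FSN = c(45001, 45001, 45002), member_id = c(1, 2, 1))
households <- data.frame(FSN = c(45001, 45002), count_Goa = c(3, 0))

# FSN must be unique on the household side for a one-to-many join.
stopifnot(anyDuplicated(households$FSN) == 0)

# Each person row receives the values of its household.
person_level <- merge(persons, households, by = "FSN", all = TRUE)

# If every household has at least one person, the row count
# equals the number of person rows.
stopifnot(nrow(person_level) == nrow(persons))

# Household values are repeated for members of the same household.
print(person_level)
```

If the row count grew after the merge, that would point to duplicated FSN values in the household table or to households without any screened member.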
3.2 Select variables of interest
We keep only the variables needed for the final dataset.
final_dataset <- subset(Quest1_assets4_animals_q1B_Persons, select=c(
  FSN, asset_index, Bamboo_Tree, Banana_Tree, Cooking_Fuel, Floor_Material,
  Granaries_in_HH, household_head_subcaste,
  indor_Buf, indor_Cow, indor_Pou, indor_Goa,
  count_Cow, count_Buf, count_Goa, count_Pou,
  Is_Floor_Damp, member_age, member_sex, Neem_Tree, No_Mosquito_Net,
  Perm_Water_Body, Rice_Field, Roof_Material, Sprayed_2009, Sprayed_2010,
  suffered_vl_since_2nd_survey, Wall_Material, Windows_in_Room
))
str(final_dataset)
'data.frame': 81214 obs. of 29 variables:
$ FSN : int 45001 45001 45001 45001 45001 45001 45002 45002 45002 45002 ...
$ asset_index : chr "4" "4" "4" "4" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 153 153 ...
$ Granaries_in_HH : num 1 1 1 1 1 1 1 1 1 1 ...
$ household_head_subcaste : chr "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
$ indor_Buf : num 0 0 0 0 0 0 0 0 0 0 ...
$ indor_Cow : num 0 0 0 0 0 0 0 0 0 0 ...
$ indor_Pou : num 1 1 1 1 1 1 0 0 0 0 ...
$ indor_Goa : num 1 1 1 1 1 1 0 0 0 0 ...
$ count_Cow : num 0 0 0 0 0 0 0 0 0 0 ...
$ count_Buf : num 0 0 0 0 0 0 1 1 1 1 ...
$ count_Goa : num 3 3 3 3 3 3 0 0 0 0 ...
$ count_Pou : num 1 1 1 1 1 1 0 0 0 0 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 1 1 ...
$ member_age : int 50 48 25 22 0 11 35 33 17 15 ...
$ member_sex : int 2 3 2 3 2 2 2 3 2 2 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Mosquito_Net : num 0 0 0 0 0 0 2 2 2 2 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Roof_Material : num 161 161 161 161 161 161 159 159 159 159 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ suffered_vl_since_2nd_survey: int 0 0 0 0 0 0 0 0 0 0 ...
$ Wall_Material : num 164 164 164 164 164 164 164 164 164 164 ...
$ Windows_in_Room : num 1 1 1 1 1 1 1 1 1 1 ...
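In this output asset_index is stored as character, and variables such as Floor_Material hold numeric category codes. Depending on the analysis, it may be useful to convert them. The sketch below uses a hypothetical data frame `toy`; whether asset_index should be an integer and the material codes should be factors is an assumption, not something the source states.

```r
# Toy extract with the same column types as in the str() output (hypothetical values).
toy <- data.frame(
  asset_index    = c("4", "3", "2", "1"),
  Floor_Material = c(153, 153, 155, 153),
  stringsAsFactors = FALSE
)

# Convert the character index to integer and check that no value
# failed to convert (a failed conversion would produce a new NA).
asset_num <- as.integer(toy$asset_index)
n_failed <- sum(is.na(asset_num) & !is.na(toy$asset_index))
stopifnot(n_failed == 0)
toy$asset_index <- asset_num

# Treat numeric category codes as categories rather than quantities.
toy$Floor_Material <- factor(toy$Floor_Material)

str(toy)
```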
We now save this final dataset.